Noam Ross (with lots of slides by David L Miller)
August 5th, 2018
Models that look like:
\[ y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \ldots + \epsilon_i \]
(describe the response, \( y_i \), as linear combination of the covariates, \( x_{ji} \), with an offset)
We can make \( y_i\sim \) any exponential family distribution (Normal, Poisson, etc).
Error term \( \epsilon_i \) is normally distributed (usually).
lm(y ~ x1, data=dat)
lm(y ~ x1 + poly(x1, 2), data=dat)
(Sort of. Let's fire up RStudio)
In GLM we model the mean of data as a sum of linear terms:
\[ y_i = \beta_0 +\sum_j \color{red}{ \beta_j x_{ji}} +\epsilon_i \]
A GAM is a sum of smooth functions or smooths
\[ y_i = \beta_0 + \sum_j \color{red}{s_j(x_{ji})} + \epsilon_i \]
where \( \epsilon_i \sim N(0, \sigma^2) \), \( y_i \sim \text{Normal} \) (for now)
Call the above equation the linear predictor in both cases.
Several different types of splines, with different applications
Basis functions of cubic spline (top), and thin-plate spline (bottom)
Back RStudio to look at this in our model!
\[ \int_\mathbb{R} \left( \frac{\partial^2 f(x)}{\partial^2 x}\right)^2 \text{d}x = \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} = \large{W} \]
(Wigglyness is 100% the right mathy word)
We penalize wiggliness to avoid overfitting.
\[ \log(\text{Likelihood}) - \lambda W \]
Hence gam(..., method="REML")
\[ \text{EDF} < k \]
s() terms with s(variable, k=n)\[ \text{count}_i = A_i \exp \left( \beta_0 + s(x_i, y_i) + s(\text{Depth}_i)\right) \]
Without uncertainty, we're not doing statistics
All the model coefficients together have approximately a multi-normal distribution around the mean:
\[ \boldsymbol{\beta} \sim N(\hat{\boldsymbol{\beta}}, \mathbf{V}_\boldsymbol{\beta}) \]
However, this is a distribution dependent on the smoothing parameter.
In mgcv, vcov(model) returns \( \mathbf{V}_\boldsymbol{\beta} \), the variance-covariance
matrix of the coefficients.
vcov(mode, unconditional = TRUE) corrects for uncertainty in the smoothing parameter.
plotse.fit in the predict() function